Discourse Segmentation in Aid of Document Summarization

Authors

  • Branimir Boguraev
  • Mary S. Neff

Abstract

This paper describes work to enhance a sentence-based summarizer with notions of salience, dynamically adjustable summary size, discourse segmentation, and awareness of topic shifts. Our experiments study strategies to diversify the application of a baseline summarizer, by making it aware of finer-grained ‘aboutness’, capable of discerning changes of topic, and sensitive to longer-than-usual documents. Evaluated against the corpus used in the development of the baseline summarizer, summaries derived either by means of segmentation analysis alone, or by a mix of strategies for combining salience calculation and topic shift detection, are shown to be of comparable, and under certain conditions even better, quality. We describe the summarization and segmentation procedures, outline a number of strategies for mixing the two, evaluate the overall impact of discourse segmentation, and suggest an interface design capable of using the notion of topic shifts to contextualize a summary and facilitate the mediation between it and the full document source.
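
The abstract does not spell out the algorithms, so the following is only a rough illustration of how salience scoring and topic-shift detection can be combined, not the authors' procedure. It assumes two stand-in techniques: a simple frequency-based salience score, and a TextTiling-style segmenter that starts a new segment where the word counts of adjacent sentence windows have low cosine similarity. The function names `segment`, `salience` and `summarize`, the window size of 2, the 0.1 threshold and the stopword list are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "it", "on", "for", "with"}


def terms(sentence: str) -> list[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment(sentences: list[str], window: int = 2, threshold: float = 0.1) -> list[list[int]]:
    """TextTiling-style sketch (assumed, not the paper's method): open a new
    segment at gap i when the `window` sentences before it share too little
    vocabulary with the `window` sentences after it."""
    bags = [Counter(terms(s)) for s in sentences]
    segments, current = [], [0]
    for i in range(1, len(sentences)):
        left = sum(bags[max(0, i - window):i], Counter())
        right = sum(bags[i:i + window], Counter())
        if cosine(left, right) < threshold:  # low cohesion: treat as a topic shift
            segments.append(current)
            current = []
        current.append(i)
    segments.append(current)
    return segments


def salience(sentences: list[str]) -> list[float]:
    """Hypothetical salience: mean document frequency of a sentence's content words."""
    doc_tf = Counter(t for s in sentences for t in terms(s))
    scores = []
    for s in sentences:
        ts = terms(s)
        scores.append(sum(doc_tf[t] for t in ts) / len(ts) if ts else 0.0)
    return scores


def summarize(sentences: list[str], size: int) -> list[str]:
    """Take the most salient sentence of each segment, keep the `size` best,
    and return them in original document order."""
    scores = salience(sentences)
    picks = [max(seg, key=lambda i: scores[i]) for seg in segment(sentences)]
    picks = sorted(picks, key=lambda i: scores[i], reverse=True)[:size]
    return [sentences[i] for i in sorted(picks)]


if __name__ == "__main__":
    doc = [
        "The summarizer scores sentences by salience.",
        "Salience scores favour sentences with frequent terms.",
        "Rainfall in the valley doubled this spring.",
        "Farmers in the valley welcomed the rainfall.",
    ]
    print(segment(doc))       # two segments: [[0, 1], [2, 3]]
    print(summarize(doc, 2))  # one sentence from each topic
```

Picking at most one sentence per segment is one simple way to keep a short summary from concentrating on a single topic; raising `size` lengthens the summary, up to one sentence per detected segment.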

Similar sources

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Thai News Text Summarization and Its Application

Since the Thai language lacks word/phrase/sentence boundaries, document summarization in Thai needs investigations in unit segmentation, unit selection, redundancy removal and evaluation dataset construction. In this work, we have proposed Thai Elementary Discourse Unit (TEDU) and a three-stage method of Thai multi-document summarization, i.e., unit segmentation, unit-graph formulation, and unit sel...

Thai Multi-Document Summarization: Unit Segmentation, Unit-Graph Formulation, and Unit Selection

There have been several challenges in the summarization of multiple Thai documents, since the Thai language itself lacks explicit word/phrase/sentence boundaries. This paper defines the Thai Elementary Discourse Unit (TEDU) and then presents our three-stage summarization process. Towards implementation of this process, we propose unit segmentation using TEDUs and their derivatives, unit-graph ...

Lexical cohesion, discourse segmentation and document summarization

Summaries automatically derived by sentence extraction are known to exhibit some coherence degradation, readability deterioration, and topical under-representation. We propose a strategy for improving upon these problems, aiming to generate more cohesive summaries by analyzing the lexical cohesion factors in the source document texts. As an initial experiment, we have looked at one particular f...

Discourse Segmentation for Sentence Compression

Earlier studies have raised the possibility of summarizing at the level of the sentence. This simplification should help in adapting textual content in a limited space. Therefore, sentence compression is an important resource for automatic summarization systems. However, there are few studies that consider sentence-level discourse segmentation for compression task; to our knowledge, none in Spa...

Publication date: 2000